[SPARK-49269][SQL] Eagerly evaluate VALUES() list in AstBuilder #47791


@costas-db costas-db commented Aug 16, 2024

What changes were proposed in this pull request?

This is a continuation of a prior performance improvement, #47428, which eagerly evaluates memory-heavy UnresolvedInlineTable parse tree nodes as soon as they are constructed in the AstBuilder.

This PR applies the optimization to any statement that may contain one or more VALUES() clauses (for example, in subqueries), rather than only to INSERT INTO ... VALUES statements, which is all the prior PR covered.

Why are the changes needed?

These changes reduce the memory footprint of every statement that can contain a VALUES() clause. They also improve on the previous optimization by avoiding unnecessary traversals of the parse tree, which improves runtime performance and minimizes the time the UnresolvedInlineTable is kept in memory.
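
To illustrate the idea, here is a minimal, self-contained sketch. It is a toy model and not the code in this PR: `Expr`, `Plan`, `UnresolvedTable`, `LocalTable`, `buildValues`, and `evaluateLater` are all hypothetical names made up for the example, and the real AstBuilder and UnresolvedInlineTable are considerably more involved. The sketch contrasts evaluating a VALUES list at the moment its node is built with evaluating it in a separate pass over the finished tree.

```scala
// Toy model only: every type and method here is a hypothetical stand-in,
// not one of Spark's classes.
sealed trait Expr
final case class Lit(value: Int) extends Expr
final case class Add(left: Expr, right: Expr) extends Expr

sealed trait Plan
// Unevaluated VALUES list: keeps one expression tree per cell.
final case class UnresolvedTable(rows: Seq[Seq[Expr]]) extends Plan
// Evaluated VALUES list: keeps only the computed values.
final case class LocalTable(rows: Seq[Seq[Int]]) extends Plan
final case class Subquery(child: Plan) extends Plan
final case class Insert(table: String, source: Plan) extends Plan

object EagerValuesSketch {
  // Evaluate a single cell expression to its value.
  def eval(e: Expr): Int = e match {
    case Lit(v)    => v
    case Add(l, r) => eval(l) + eval(r)
  }

  // Eager strategy: evaluate the rows where the builder creates the node,
  // so the per-cell expression trees never become part of the plan.
  def buildValues(rows: Seq[Seq[Expr]]): Plan =
    LocalTable(rows.map(_.map(eval)))

  // Deferred strategy: a separate pass that walks the finished plan and
  // replaces each unevaluated table it finds, wherever it is nested.
  def evaluateLater(plan: Plan): Plan = plan match {
    case UnresolvedTable(rows) => LocalTable(rows.map(_.map(eval)))
    case Subquery(child)       => Subquery(evaluateLater(child))
    case Insert(t, source)     => Insert(t, evaluateLater(source))
    case other                 => other
  }
}
```

In this sketch both strategies yield the same plan. The eager one needs no extra traversal and works regardless of where the VALUES list is nested, which mirrors the motivation described above.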

Does this PR introduce any user-facing change?

No

How was this patch tested?

Provided Scala tests.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Aug 16, 2024
@costas-db costas-db changed the title [SQL] Eagerly evaluate VALUES() list in AstBuilder [WIP][SQL] Eagerly evaluate VALUES() list in AstBuilder Aug 16, 2024
@costas-db costas-db changed the title [WIP][SQL] Eagerly evaluate VALUES() list in AstBuilder [WIP][SPARK-49269][SQL] Eagerly evaluate VALUES() list in AstBuilder Aug 16, 2024
@costas-db costas-db changed the title [WIP][SPARK-49269][SQL] Eagerly evaluate VALUES() list in AstBuilder [SPARK-49269][SQL] Eagerly evaluate VALUES() list in AstBuilder Aug 16, 2024
@cloud-fan

I'll merge after CI passes.

@@ -226,4 +287,72 @@ class InlineTableParsingImprovementsSuite extends QueryTest with SharedSparkSess
}
}
}

test("Value list in subquery") {

Let's add a JIRA prefix in the test title

Suggested change
test("Value list in subquery") {
test("SPARK-49269: Value list in subquery") {

}
}

test("Value list in projection list subquery") {

Here too: please add the JIRA prefix to this test title.

@HyukjinKwon

Merged to master.

IvanK-db pushed a commit to IvanK-db/spark that referenced this pull request Sep 20, 2024
### What changes were proposed in this pull request?
This is a continuation of a prior performance improvement, apache#47428, which eagerly evaluates memory-heavy `UnresolvedInlineTable` parse tree nodes as soon as they are constructed in the AstBuilder.

This PR applies this optimization to any statement that might contain one or more `VALUES()` clauses (such as subqueries etc), instead of just applying that optimization to `INSERT INTO ... VALUES` statements, which is what the prior PR did.

### Why are the changes needed?
With these changes we not only reduce the memory footprint of every statement that can contain the `VALUES()` clause, but we also improve upon the previous optimization as we avoid unnecessary traversals of the parse tree, which not only improves the runtime performance, but also minimizes the amount of time in which the `UnresolvedInlineTable` is kept in memory.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Provided Scala tests.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#47791 from costas-db/eagerlyEvaluateUnresolvedInlineTableInAstBuilder.

Authored-by: Costas Zarifis <costas.zarifis@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024